Merge music branch to develop. #56
Merged
…ed labels (model output) to more verbose format usable by `export_music.py`.
…to define settings and `page_parser.py` to create music exporter object of `music/export_music/ExportMusicPage`.
…g box around polygon.
…t.py`. Get names only from Yolo `result.names`.
…music.py` a stand-alone script.
… text Layout engine to work only with 'text' lines.
…h with its own setting and set of categories to work with.
# Conflicts: # pero_ocr/core/layout.py
Parameter sets whether PageOCR should update to the new line: - every time (false) - only if better confidence (true). (Applies when rerunning OCR on a previously transcribed line.)
TODOs from MartinK.
Results from
- Versions older than 4.2 define baseline as a simple float (that's where the original baseline comes from). - Version 4.2 and newer define baseline as a PointsType string with the recommended format: "x1,y1 x2,y2 ..."
…port options. - Versions older than 4.2 define baseline as a simple float (baseline is exported as the mean of all Y baseline points). - Version 4.2 and newer define baseline as a PointsType string with the recommended format: "x1,y1 x2,y2 ..."
Old XMLs on input don't have a category => `line.category = None`, so OCR (and others) have to be set to `[]` by default to process ALL PAGES.
1) Remove `ultralytics` and `music21` from dependencies for the whole project. The user will have to install them when needed. 2) Import `ultralytics` only when needed, so it doesn't create an import error for specific numpy versions. Ultralytics has this dependency right now: "numpy>=1.23.5,<2.0.0". See current at [github.com/ultralytics/ultralytics/blob/main/pyproject.toml](https://github.com/ultralytics/ultralytics/blob/69cfc8aa228dbf1267975f82fcae9a24665f23b9/pyproject.toml#L67)
@vlachvojta I have found a few bugs that should be resolved:
In `smart_sorter.py`: - if fewer than two engines are filtered, return the original page_layout and not only the split one. In `music structures.py`: - change type of `lengths` to numpy array, fix min_length to take from numbers and not names. - ensure `encoded_group` is not None before appending it to the voice. Full comment: [pero-ocr/pull/56/#issuecomment-2245202776](#56)
…whole region to positive or negative (ignore categories of lines inside the region)
Export multirest as a simple default 'whole' rest.
…on category, None = 'text')
…fidence -- in case when there are no logits (i.e. logits.shape[0] == 0) the confidence cannot be calculated.
Summary of changes in pull request #56 music->develop
`parse_folder.py`
YOLO + Non-text regions
- Add `LayoutExtractorYolo` as a layout parser capable of using a YOLO model to detect "non-text" regions (e.g. images, tables, etc.)
- Add a `category` attribute to `RegionLayout` and `TextLine` (storing YOLO output classes, or `text` for original layout parsers)
- To use `LayoutExtractorYolo` and YOLO inference, you have to install the `ultralytics` library manually (as it creates library version problems when imported with other libraries)
See code in: `pero_ocr/document_ocr/page_parser.py/LayoutExtractorYolo`
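The "install manually, import only when needed" approach could look roughly like the sketch below. `require_optional` and `load_yolo_model` are hypothetical helper names used for illustration, not functions from this PR; the only external call assumed is the `YOLO` class exported by `ultralytics`.

```python
import importlib


def require_optional(module_name, install_hint):
    """Import an optional dependency on first use, with a helpful error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature but is not installed. "
            f"{install_hint}"
        ) from err


def load_yolo_model(model_path):
    # The import happens here, not at module import time, so users who never
    # run YOLO layout detection do not need ultralytics (or its numpy pin).
    ultralytics = require_optional("ultralytics", "Try: pip install ultralytics")
    return ultralytics.YOLO(model_path)
```

With this shape, importing the module that defines the layout parser stays safe even when `ultralytics` is missing; the error only appears once a YOLO model is actually requested.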
Export + refactor of AltoXML and PageXML
- `v2.x`: baseline is only a float (mean of Y baseline coords)
- `v4.4`: baseline is a string in `x1,y1 x2,y2 ...` format (same as PageXML)
See code in: `pero_ocr/core/layout.py/*.(to|from)_(page|alto)xml()`
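To make the two baseline representations concrete, here is a small sketch. The function names are made up for illustration and are not the ones in `layout.py`; rounding coordinates to integers in the string form is also an assumption.

```python
def baseline_to_float(points):
    """v2.x style: a single float, the mean of the Y coordinates."""
    ys = [y for _, y in points]
    return sum(ys) / len(ys)


def baseline_to_points_string(points):
    """v4.4 / PageXML style: "x1,y1 x2,y2 ..." (rounding is an assumption)."""
    return " ".join(f"{round(x)},{round(y)}" for x, y in points)


def parse_baseline(value):
    """Accept either form; a bare float carries no X information."""
    if "," not in value:
        return float(value)
    return [tuple(float(c) for c in pair.split(",")) for pair in value.split()]


baseline = [(10, 100), (60, 104), (110, 102)]
print(baseline_to_float(baseline))          # 102.0
print(baseline_to_points_string(baseline))  # 10,100 60,104 110,102
```

Note that the float form is lossy: reading it back gives only a height, so a polyline baseline cannot be recovered from a `v2.x` export.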
Music (optical music recognition)
- `music` folder.
- `user_scripts/export_music.py`
Changes in .ini config for `parse_folder.py`
Section names
Allow the user to define more than one `Layout parser`, `Line Cropper` and `OCR` section with the new format:
- `LAYOUT_PARSER_\d+` or `LAYOUT_PARSER` (updated)
- `LINE_CROPPER_\d+` or `LINE_CROPPER`
- `OCR_\d+` or `OCR`
See code in: `pero_ocr/document_ocr/page_parser.py/PageParser.init_config_sections()`
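As a rough illustration of the naming scheme, the sketch below collects matching sections from a config with the standard `configparser` and `re` modules. `find_sections` and `SECTION_PATTERNS` are hypothetical names, and this is not necessarily how `init_config_sections()` does it.

```python
import configparser
import re

# One pattern per kind of section: the bare name or the name plus "_<number>".
SECTION_PATTERNS = {
    "layout_parsers": re.compile(r"LAYOUT_PARSER(_\d+)?"),
    "line_croppers": re.compile(r"LINE_CROPPER(_\d+)?"),
    "ocrs": re.compile(r"OCR(_\d+)?"),
}


def find_sections(config):
    """Group section names by kind, sorted alphabetically within each kind."""
    return {
        kind: sorted(name for name in config.sections() if pattern.fullmatch(name))
        for kind, pattern in SECTION_PATTERNS.items()
    }


config = configparser.ConfigParser()
config.read_string("""
[OTHER]
[LAYOUT_PARSER_2]
[LAYOUT_PARSER_3]
[LINE_CROPPER_1]
[OCR_1]
[OCR_JSON_EXTRA]
""")
print(find_sections(config))
```

Using `fullmatch` keeps unrelated sections such as the made-up `OCR_JSON_EXTRA` out of the `OCR` group.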
Execution order example
Sections are executed in alphabetical order (first executing all Layout parsers and then pairs of `Line Cropper` with the corresponding `OCR`).
LAYOUT_PARSER_2
LAYOUT_PARSER_3
LINE_CROPPER_1
LINE_CROPPER_2
OCR_1
OCR_2
==>
LAYOUT_PARSER_2
LAYOUT_PARSER_3
LINE_CROPPER_1
OCR_1
LINE_CROPPER_2
OCR_2
See code in: `pero_ocr/document_ocr/page_parser.py/PageParser.process_page()`
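The ordering in the example above can be reproduced with a short sketch. `execution_order` is a hypothetical function, and it assumes that a `Line Cropper` and an `OCR` "correspond" when their suffixes match (`LINE_CROPPER_1` with `OCR_1`, bare `LINE_CROPPER` with bare `OCR`).

```python
def execution_order(section_names):
    """Order sections: all layout parsers, then cropper/OCR pairs by suffix."""
    def with_prefix(prefix):
        # Map suffix ("" or "_<n>") -> section name, for one kind of section.
        return {
            name[len(prefix):]: name
            for name in section_names
            if name == prefix or name.startswith(prefix + "_")
        }

    parsers = with_prefix("LAYOUT_PARSER")
    croppers = with_prefix("LINE_CROPPER")
    ocrs = with_prefix("OCR")

    order = [parsers[suffix] for suffix in sorted(parsers)]
    # Assumption: a cropper and an OCR "correspond" when their suffixes match.
    for suffix in sorted(set(croppers) | set(ocrs)):
        for group in (croppers, ocrs):
            if suffix in group:
                order.append(group[suffix])
    return order


names = ["LAYOUT_PARSER_2", "LAYOUT_PARSER_3", "LINE_CROPPER_1",
         "LINE_CROPPER_2", "OCR_1", "OCR_2"]
print(execution_order(names))
```

Because the sort is alphabetical on strings, `_10` would come before `_2`; zero-padded suffixes avoid that surprise.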
New attributes
`LINE_CATEGORIES` (list, [] by default) attribute for `LAYOUT_PARSER`:
- For a `RegionLayout` object (created by `LayoutExtractorYolo`) of some `category` in this list, also create `TextLine` objects with the same `category` (otherwise leave the `RegionLayout` empty, with no `TextLine` objects).
- Creating `TextLine` objects makes sense for `music` regions but not for non-text regions (e.g. `images`), as they don't have transcriptions.
See code in: `pero_ocr/document_ocr/page_parser.py/LayoutExtractorYolo.process_page()`
`CATEGORIES` (list, [] by default) attribute for `LINE_CROPPER` and `OCR` sections:
- Run the `LINE_CROPPER` and `OCR` engines only on `TextLine` objects with these categories (or [] for all)
- Applied in the `process_page` call using `split_page_layout_by_categories` and `merge_page_layouts`
See code in: `pero_ocr/layout_engines/layout_helpers.py/split_page_layout_by_categories()`
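The split, process, merge idea can be sketched on a flat list of lines. Everything here is a simplified stand-in: `Line`, `split_by_categories` and `run_on_categories` are hypothetical and do not mirror the real `PageLayout` structures or the signatures of the helpers named above.

```python
from dataclasses import dataclass


@dataclass
class Line:  # hypothetical stand-in for a TextLine
    id: str
    category: str = "text"
    transcription: str = ""


def split_by_categories(lines, categories):
    """Return (selected, rest); an empty category list selects every line."""
    if not categories:
        return list(lines), []
    selected = [line for line in lines if line.category in categories]
    rest = [line for line in lines if line.category not in categories]
    return selected, rest


def run_on_categories(lines, categories, engine):
    """Apply engine only to matching lines, then restore the original order."""
    selected, rest = split_by_categories(lines, categories)
    for line in selected:
        engine(line)
    position = {line.id: i for i, line in enumerate(lines)}
    return sorted(selected + rest, key=lambda line: position[line.id])
```

For example, running an engine with `categories=["music"]` touches only music lines, while `categories=[]` processes every line, matching the "[] for all" default.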
`SUBSTITUTE_OUTPUT` (bool, `yes` by default) attribute for `OCR` section:
- If `yes`, substitute the output of the OCR engine using `output_substitution_table` in `OCR_JSON` (OCR engine configuration), as a dictionary substitution key->value.
See code in: `pero_ocr/document_ocr/page_parser.py/PageOCR.substitute_transcriptions()`
`SUBSTITUTE_OUTPUT_ATOMIC` (bool, `no` by default) attribute for `OCR` section:
- If `yes`, translation is done in an atomic way on a page level: either all lines are translated or none.
- If `no`, translation is done in a best-effort way: lines are translated independently and if some line fails, it is left untranslated.
See code in: `pero_ocr/document_ocr/page_parser.py/PageOCR.substitute_transcriptions()`
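The difference between atomic and best-effort substitution can be shown with a minimal sketch. It assumes that a line is a string of whitespace-separated tokens and that a line "fails" when a token is missing from the table; both are assumptions, as are the function names, and the real `substitute_transcriptions()` may differ.

```python
def substitute_line(text, table):
    """Translate one line token by token; return None if any token is unknown."""
    out = []
    for token in text.split():
        if token not in table:
            return None
        out.append(table[token])
    return " ".join(out)


def substitute_page(lines, table, atomic=False):
    """Substitute all lines of a page, atomically or best-effort."""
    translated = [substitute_line(text, table) for text in lines]
    if atomic:
        # All or nothing: one failed line leaves the whole page untouched.
        if any(t is None for t in translated):
            return list(lines)
        return translated
    # Best effort: keep the original text for lines that failed.
    return [t if t is not None else orig for t, orig in zip(translated, lines)]


table = {"A": "alpha", "B": "beta"}  # made-up substitution table
page = ["A B", "A C"]                # "C" has no entry, so line 2 fails
print(substitute_page(page, table, atomic=False))  # ['alpha beta', 'A C']
print(substitute_page(page, table, atomic=True))   # ['A B', 'A C']
```

The atomic mode avoids pages with a mix of translated and untranslated lines, at the cost of losing the lines that would have succeeded.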
`UPDATE_TRANSCRIPTION_BY_CONFIDENCE` (bool, `no` by default) attribute for `OCR` section:
- If `yes`, update the line transcription only if the new transcription has higher confidence.
- If `no`, always update the transcription.
See code in: `pero_ocr/document_ocr/page_parser.py/PageOCR.process_page()`
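The update rule amounts to a small decision function, sketched below. `choose_transcription` is a hypothetical name, and the handling of a missing confidence (`None`) is an assumption rather than something this PR documents.

```python
def choose_transcription(old, new, update_by_confidence=False):
    """Each argument is a (text, confidence) pair; confidence may be None."""
    if not update_by_confidence:
        return new  # 'no': always take the new result
    old_conf, new_conf = old[1], new[1]
    if old_conf is None:
        return new  # nothing to compare against (e.g. first OCR run)
    if new_conf is None:
        return old  # assumption: an unknown confidence never wins
    return new if new_conf > old_conf else old
```

This matters when OCR is rerun on a previously transcribed line: with the option enabled, a weaker second pass cannot overwrite a better earlier result.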
... see commit messages for more ...